Getting Started

This document provides the code and output from the Time Series I class. This is a living document and may be updated throughout the semester (when this occurs, you will be notified that it has been updated). First, here is a list of all the libraries that you will need:

library(tseries)
library(forecast)
library(haven)
library(fma)
library(expsmooth)
library(lmtest)
library(zoo)
library(seasonal)
library(ggplot2)
library(seasonalview)
library(aTSA)
library(imputeTS)
library(prophet)

The data sets that you will need are as follows (be sure to put the correct location and file name for your computer):

#file.dir = "..." 
input.file1 = "usairlines.csv"
input.file2 = "steel.csv"
input.file3 = "leadyear.csv"
input.file4 = "ebay9899.csv"
input.file5 = "fpp_insurance.csv" 
input.file6 = "ar2.csv"
input.file7 = "MA2.csv"
input.file8 = "hurrican.csv"

# Reads the data at specified directory
# If the file directory is incorrect, then this won't run
USAirlines = read.table(paste(file.dir, input.file1,sep = ""),sep=",", header=T)
Steel = read.table(paste(file.dir, input.file2, sep = ""),sep=",",header=T)
Lead.Year = read.table(paste(file.dir, input.file3, sep = ""),sep=",",header=T)
Ebay = read.table(paste(file.dir, input.file4, sep = ""),sep=",",header=T)
Quotes= read.table(paste(file.dir, input.file5, sep = ""),sep=",",header=T)
Y= read.table(paste(file.dir, input.file6, sep = ""),sep=",",header=T)
x=read.table(paste(file.dir, input.file7, sep = ""),sep=",",header=T)
hurricane=read.table(paste(file.dir, input.file8, sep = ""),sep=",",header=T)

For many time series applications, you will need a time series object in R. This is created using the function ts. For example, the time series data in the USAirlines data frame is in the column “Passengers”. Let’s go ahead and create the time series object for this data set and graph it.

Passenger <- ts(USAirlines$Passengers, start = 1990, frequency =12)

autoplot(Passenger)+labs(title="Time Series plot for Passengers", x="Date",y="Passengers")

Within the ts command, the only required argument is the vector of data that contains the time series information (in this case USAirlines$Passengers). The optional argument “start” is for nice plotting purposes (the plot then uses the correct time frame instead of just 1, 2, 3, … for time). The last argument shown is “frequency”, which is for seasonal data. If your time series object has a seasonality to it, then you should specify the length of the season (R does not know this unless you provide it). For future analysis, we will need to create the time series object for Steel.

SteelShp <- ts(Steel$steelshp, start = 1984, frequency = 12)
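As a quick, self-contained illustration of how the ts() arguments behave (simulated data here, not the class data sets):

```r
# A small demo of ts() attributes on simulated monthly data
set.seed(123)
toy <- ts(rnorm(36), start = c(2020, 1), frequency = 12)

start(toy)      # 2020 1
end(toy)        # 2022 12  (36 monthly observations from Jan 2020)
frequency(toy)  # 12
```

The start/frequency information is what lets autoplot and the decomposition functions label time correctly.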

Time series decomposition

If your time series has a seasonal component to it, a useful visualization is the decomposition. We will be using the STL decomposition (which can only do the additive decomposition, NOT multiplicative!). The following code creates the decomposition and then plots it:

# Time Series Decomposition ...STL#
decomp_stl <- stl(Passenger, s.window = 7)

# Plot the individual components of the time series
plot(decomp_stl)

autoplot(decomp_stl)

You can pull off the different components (Seasonal, Trend or Remainder).

head(decomp_stl$time.series)
##            seasonal    trend remainder
## Jan 1990 -4526.7610 39081.77 -207.0131
## Feb 1990 -5827.5592 38942.75  420.8128
## Mar 1990   560.4986 38803.72 1213.7829
## Apr 1990  -802.2312 38664.69  404.5406
## May 1990   139.2095 38533.15 -423.3574
## Jun 1990  2953.8857 38401.61 -563.4910

This means we can overlay the original data with the trend component (the second column).

autoplot(Passenger)+geom_line(aes(y=decomp_stl$time.series[,2]),color="blue")

Notice that the trend component is VERY similar to the “seasonally adjusted” data! Do you know what the difference between the two series is?

seas_adj=Passenger-decomp_stl$time.series[,1]

autoplot(Passenger) +
  geom_line(aes(y=decomp_stl$time.series[,2]),color="blue") +
  geom_line(aes(y=seas_adj),color="orange")

Another interesting plot is the subseries plot. This looks at the individual series (in this case, the series for January, the series for February, etc….).

# Plot seasonal subseries by months
ggsubseriesplot(Passenger)

Just a quick note. STL ONLY does additive seasonal decomposition. The decompose function (in base R) will do both additive AND multiplicative decomposition.
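If you want to try a multiplicative fit, here is a small sketch using base R's decompose() on the built-in AirPassengers series (not one of the class data sets):

```r
# Multiplicative decomposition with base R's decompose()
decomp_mult <- decompose(AirPassengers, type = "multiplicative")

# In a multiplicative fit the components multiply back to the data:
# trend * seasonal * random = observed (where all parts are defined)
recon <- decomp_mult$trend * decomp_mult$seasonal * decomp_mult$random
head(na.omit(recon - AirPassengers))  # essentially zero

plot(decomp_mult)
```

Note that the trend is NA at the ends because decompose() uses a centered moving average, unlike STL.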

Self-study

Here is something to get you started if you want to take a look at the X13 decomposition!

decomp_x13=seas(Passenger)
summary(decomp_x13)
## 
## Call:
## seas(x = Passenger)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## Leap Year          1.610e+03  4.003e+02   4.021 5.79e-05 ***
## Weekday            5.586e+01  1.890e+01   2.956  0.00312 ** 
## LS1992.Jun         4.006e+03  8.439e+02   4.747 2.07e-06 ***
## LS1992.Oct        -3.511e+03  8.409e+02  -4.176 2.97e-05 ***
## LS2001.Sep        -1.580e+04  8.633e+02 -18.296  < 2e-16 ***
## LS2001.Nov         5.492e+03  8.629e+02   6.365 1.96e-10 ***
## AO2002.Dec         4.412e+03  8.402e+02   5.251 1.51e-07 ***
## MA-Nonseasonal-01  5.073e-01  5.991e-02   8.468  < 2e-16 ***
## MA-Seasonal-12     4.981e-01  6.133e-02   8.121 4.62e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## SEATS adj.  ARIMA: (0 1 1)(0 1 1)  Obs.: 219  Transform: none
## AICc:  3501, BIC:  3533  QS (no seasonality in final):    0  
## Box-Ljung (no autocorr.): 22.77   Shapiro (normality): 0.9914  
## Messages generated by X-13:
## Warnings:
## - At least one visually significant trading day peak has been
##   found in one or more of the estimated spectra.
## Neat R shiny application....run OUTSIDE of RMarkdown
#view(decomp_x13)

Exponential Smoothing Models

In R, we are able to do Simple Exponential Smoothing Models, Holt Exponential Smoothing Models and Holt-Winters Exponential Smoothing Models. For the first two (Simple and Holt), we will be using the Steel data set, and for the last one (Holt-Winters), we will be using the airline data set (we will also use the airline data set to illustrate the damped trend model). Each of these is shown below.

Simple Exponential Smoothing

For Simple Exponential Smoothing Models (SES), we have only one component, referred to as the level component.

$$\hat{Y}_{t+1} = L_t$$
$$L_t = \theta Y_t + (1-\theta)L_{t-1}$$

This is basically a weighted average of the last observation and the last predicted value. Since this model only has a level component, forecasts from SES models form a horizontal line (hence why this method is called “one-step ahead” forecasting).
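To see that SES really is just this recursion, here is a hand-rolled sketch in base R (simulated data; the theta value is arbitrary, chosen only for illustration):

```r
# Hand-rolled simple exponential smoothing mirroring the formula:
# L_t = theta * Y_t + (1 - theta) * L_{t-1}; the forecast is the last level
ses_by_hand <- function(y, theta) {
  L <- numeric(length(y))
  L[1] <- y[1]                      # initialize the level at the first observation
  for (t in 2:length(y)) {
    L[t] <- theta * y[t] + (1 - theta) * L[t - 1]
  }
  L[length(y)]                      # same value for every forecast horizon
}

set.seed(1)
y <- rnorm(50, mean = 100, sd = 5)
ses_by_hand(y, theta = 0.45)
```

Every forecast horizon gets this same number, which is why the forecast plot is a horizontal line.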

In the R code, you can choose how the initial values are selected. If you specify simple, then the first few observations will be used to estimate the starting value. If you select optimal, then the algorithm uses the ets (will be discussed later) to optimize the starting values and the smoothing parameters. You can also specify the value for “h”, which is the number of forecasts to create (take a look at the forecast…do you see a horizontal line?).

# Building a Single Exponential Smoothing (SES) Model - Steel Data #
SES.Steel <- ses(SteelShp, initial = "simple", h = 24)
summary(SES.Steel)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = SteelShp, h = 24, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.4549 
## 
##   Initial states:
##     l = 5980 
## 
##   sigma:  460.4357
## Error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 11.43866 460.4357 363.9341 -0.2204828 5.708307 0.8287599
##                     ACF1
## Training set -0.04379112
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1992       6479.571 5889.499 7069.643 5577.133 7382.008
## Feb 1992       6479.571 5831.305 7127.836 5488.134 7471.007
## Mar 1992       6479.571 5777.922 7181.219 5406.492 7552.649
## Apr 1992       6479.571 5728.323 7230.819 5330.636 7628.505
## May 1992       6479.571 5681.801 7277.340 5259.487 7699.654
## Jun 1992       6479.571 5637.847 7321.295 5192.265 7766.876
## Jul 1992       6479.571 5596.077 7363.065 5128.383 7830.758
## Aug 1992       6479.571 5556.194 7402.947 5067.388 7891.754
## Sep 1992       6479.571 5517.964 7441.177 5008.920 7950.221
## Oct 1992       6479.571 5481.197 7477.944 4952.690 8006.452
## Nov 1992       6479.571 5445.736 7513.405 4898.458 8060.684
## Dec 1992       6479.571 5411.453 7547.689 4846.025 8113.116
## Jan 1993       6479.571 5378.236 7580.906 4795.224 8163.917
## Feb 1993       6479.571 5345.992 7613.150 4745.911 8213.230
## Mar 1993       6479.571 5314.640 7644.502 4697.962 8261.179
## Apr 1993       6479.571 5284.110 7675.032 4651.271 8307.871
## May 1993       6479.571 5254.340 7704.801 4605.742 8353.399
## Jun 1993       6479.571 5225.277 7733.864 4561.294 8397.847
## Jul 1993       6479.571 5196.872 7762.269 4517.853 8441.289
## Aug 1993       6479.571 5169.083 7790.058 4475.353 8483.789
## Sep 1993       6479.571 5141.871 7817.271 4433.735 8525.406
## Oct 1993       6479.571 5115.201 7843.940 4392.948 8566.194
## Nov 1993       6479.571 5089.043 7870.098 4352.942 8606.199
## Dec 1993       6479.571 5063.368 7895.773 4313.676 8645.465
# Plot the SES model on steel data
autoplot(SES.Steel)+
  autolayer(fitted(SES.Steel),series="Fitted")+ylab("US Steel Shipments") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")

# Computes accuracy statistics for SES model on steel data (training data...NOT validation nor test)
round(accuracy(SES.Steel),2) 
##                 ME   RMSE    MAE   MPE MAPE MASE  ACF1
## Training set 11.44 460.44 363.93 -0.22 5.71 0.83 -0.04

Holt ESM

The Holt model incorporates trend information. So, now there are two components: level and trend. For each component, there will be a smoothing coefficient (or weight). CAREFUL, when you look at parameter estimates, these are NOT the estimates for the mean nor the linear trend…you should be thinking of them as weights (between 0 and 1). The overall form for Holt’s method is:

$$\hat{Y}_{t+h} = L_t + hT_t$$
$$L_t = \theta Y_t + (1-\theta)(L_{t-1} + T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)T_{t-1}$$

For Holt’s method, when you forecast, you will see a trending line.
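A base-R analogue, if you want to experiment without the forecast package, is HoltWinters() with gamma = FALSE (simulated trending data here; the initialization scheme differs from holt(), so estimates will not match exactly):

```r
# Base-R analogue of Holt's linear trend method: HoltWinters(gamma = FALSE)
set.seed(42)
y <- ts(50 + 2 * (1:60) + rnorm(60, sd = 4))   # trending series with noise
holt_fit <- HoltWinters(y, gamma = FALSE)

holt_fit$alpha                  # level smoothing weight
holt_fit$beta                   # trend smoothing weight
predict(holt_fit, n.ahead = 6)  # forecasts follow a trending line
```

The forecasts increase by the final trend estimate each step, which is exactly the "trending line" described above.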

# Building a Linear Exponential Smoothing Model - Steel Data #
LES.Steel <- holt(SteelShp, initial = "optimal", h = 24)
summary(LES.Steel)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = SteelShp, h = 24, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.4329 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 6678.9989 
##     b = -0.0651 
## 
##   sigma:  471.4322
## 
##      AIC     AICc      BIC 
## 1626.001 1626.667 1638.822 
## 
## Error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -4.167318 461.5062 369.9177 -0.4760441 5.818476 0.8423858
##                     ACF1
## Training set -0.03556298
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1992       6495.463 5891.298 7099.627 5571.472 7419.453
## Feb 1992       6495.357 5836.979 7153.736 5488.455 7502.260
## Mar 1992       6495.252 5786.775 7203.730 5411.729 7578.775
## Apr 1992       6495.147 5739.865 7250.429 5340.043 7650.251
## May 1992       6495.042 5695.672 7294.412 5272.511 7717.573
## Jun 1992       6494.937 5653.768 7336.106 5208.479 7781.395
## Jul 1992       6494.832 5613.826 7375.838 5147.450 7842.214
## Aug 1992       6494.727 5575.592 7413.861 5089.032 7900.421
## Sep 1992       6494.622 5538.862 7450.381 5032.914 7956.330
## Oct 1992       6494.516 5503.468 7485.565 4978.839 8010.194
## Nov 1992       6494.411 5469.273 7519.549 4926.598 8062.225
## Dec 1992       6494.306 5436.161 7552.452 4876.013 8112.600
## Jan 1993       6494.201 5404.033 7584.369 4826.933 8161.470
## Feb 1993       6494.096 5372.805 7615.387 4779.229 8208.963
## Mar 1993       6493.991 5342.404 7645.578 4732.791 8255.191
## Apr 1993       6493.886 5312.767 7675.005 4687.520 8300.252
## May 1993       6493.781 5283.837 7703.725 4643.331 8344.230
## Jun 1993       6493.675 5255.565 7731.786 4600.149 8387.202
## Jul 1993       6493.570 5227.907 7759.234 4557.905 8429.235
## Aug 1993       6493.465 5200.824 7786.106 4516.541 8470.389
## Sep 1993       6493.360 5174.281 7812.439 4476.003 8510.718
## Oct 1993       6493.255 5148.246 7838.264 4436.241 8550.269
## Nov 1993       6493.150 5122.689 7863.611 4397.211 8589.089
## Dec 1993       6493.045 5097.585 7888.504 4358.874 8627.216
# Plot the LES model on steel data

autoplot(LES.Steel)+
  autolayer(fitted(LES.Steel),series="Fitted")+labs(title="US Steel Shipment with Holt forecasts",y="US Steel Shipments") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")

As mentioned, we can also perform Holt’s method with a damped trend. The formula for the damped trend is similar to the previous Holt formula, with the addition of a dampening parameter ϕ.

$$\hat{Y}_{t+h} = L_t + \sum_{i=1}^{h}\phi^{i}T_t$$
$$L_t = \theta Y_t + (1-\theta)(L_{t-1} + \phi T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)\phi T_{t-1}$$
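A quick numeric check shows why damped forecasts flatten out: the cumulative trend multiplier converges as the horizon grows. Using phi = 0.9083 (the value estimated for the Steel data below):

```r
# The trend contribution at horizon h is T_t times the cumulative
# multiplier sum_{i=1}^{h} phi^i, which converges as h grows
phi <- 0.9083
h <- 1:24
multiplier <- cumsum(phi^h)

round(multiplier[c(1, 6, 12, 24)], 3)
phi / (1 - phi)   # the limit the multiplier approaches (~9.9)
```

So instead of trending forever, the forecasts level off toward a finite value.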

We will illustrate the damped trend on both the Steel and Airline data sets.

LDES.Steel <- holt(SteelShp, initial = "optimal", h = 24, damped = TRUE)
summary(LDES.Steel)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = SteelShp, h = 24, damped = TRUE, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.4202 
##     beta  = 1e-04 
##     phi   = 0.9083 
## 
##   Initial states:
##     l = 6692.5352 
##     b = -54.4181 
## 
##   sigma:  472.702
## 
##      AIC     AICc      BIC 
## 1627.468 1628.412 1642.854 
## 
## Error measures:
##                    ME     RMSE     MAE        MPE     MAPE     MASE        ACF1
## Training set 8.669068 460.2275 367.178 -0.2689588 5.765085 0.836147 -0.02936686
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1992       6504.538 5898.746 7110.330 5578.059 7431.017
## Feb 1992       6504.482 5847.350 7161.614 5499.485 7509.479
## Mar 1992       6504.432 5799.671 7209.192 5426.594 7582.269
## Apr 1992       6504.386 5755.003 7253.768 5358.304 7650.467
## May 1992       6504.344 5712.838 7295.850 5293.839 7714.849
## Jun 1992       6504.306 5672.796 7335.817 5232.621 7775.992
## Jul 1992       6504.272 5634.585 7373.958 5174.201 7834.342
## Aug 1992       6504.241 5597.976 7410.505 5118.229 7890.252
## Sep 1992       6504.212 5562.783 7445.642 5064.420 7944.005
## Oct 1992       6504.187 5528.852 7479.521 5012.541 7995.832
## Nov 1992       6504.163 5496.057 7512.269 4962.398 8045.928
## Dec 1992       6504.142 5464.292 7543.992 4913.829 8094.455
## Jan 1993       6504.123 5433.465 7574.780 4866.693 8141.552
## Feb 1993       6504.105 5403.498 7604.712 4820.871 8187.339
## Mar 1993       6504.089 5374.322 7633.856 4776.260 8231.919
## Apr 1993       6504.075 5345.879 7662.271 4732.767 8275.383
## May 1993       6504.062 5318.115 7690.008 4690.313 8317.810
## Jun 1993       6504.050 5290.985 7717.114 4648.827 8359.272
## Jul 1993       6504.039 5264.447 7743.631 4608.247 8399.831
## Aug 1993       6504.029 5238.464 7769.594 4568.514 8439.544
## Sep 1993       6504.020 5213.002 7795.038 4529.578 8478.462
## Oct 1993       6504.012 5188.032 7819.992 4491.394 8516.630
## Nov 1993       6504.005 5163.526 7844.483 4453.919 8554.090
## Dec 1993       6503.998 5139.459 7868.537 4417.116 8590.880
autoplot(LDES.Steel)+
  autolayer(fitted(LDES.Steel),series="Fitted")+labs(title="US Steel Shipment Linear Damped ESM Forecast") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")

LDES.USAir <- holt(Passenger, initial = "optimal", h = 24, damped = TRUE)
summary(LDES.USAir)
## 
## Forecast method: Damped Holt's method
## 
## Model Information:
## Damped Holt's method 
## 
## Call:
##  holt(y = Passenger, h = 24, damped = TRUE, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.5721 
##     beta  = 1e-04 
##     phi   = 0.8057 
## 
##   Initial states:
##     l = 36886.1093 
##     b = 527.1535 
## 
##   sigma:  4874.3
## 
##      AIC     AICc      BIC 
## 4906.527 4906.924 4926.862 
## 
## Error measures:
##                    ME     RMSE      MAE        MPE     MAPE     MASE       ACF1
## Training set 196.2283 4818.336 3529.079 -0.2896149 7.191284 1.326257 0.04535328
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2008       63673.96 57427.30 69920.63 54120.51 73227.42
## May 2008       63674.11 56477.18 70871.04 52667.35 74680.87
## Jun 2008       63674.23 55638.45 71710.01 51384.57 75963.89
## Jul 2008       63674.33 54879.22 72469.44 50223.37 77125.29
## Aug 2008       63674.41 54180.40 73168.41 49154.58 78194.23
## Sep 2008       63674.47 53529.54 73819.40 48159.13 79189.80
## Oct 2008       63674.52 52917.92 74431.12 47223.71 80125.32
## Nov 2008       63674.56 52339.20 75009.92 46338.63 81010.49
## Dec 2008       63674.59 51788.59 75560.59 45496.53 81852.66
## Jan 2009       63674.62 51262.36 76086.88 44691.70 82657.53
## Feb 2009       63674.64 50757.52 76591.76 43919.61 83429.67
## Mar 2009       63674.66 50271.67 77077.65 43176.55 84172.76
## Apr 2009       63674.67 49802.80 77546.54 42459.48 84889.86
## May 2009       63674.68 49349.27 78000.09 41765.85 85583.51
## Jun 2009       63674.69 48909.65 78439.73 41093.51 86255.87
## Jul 2009       63674.70 48482.74 78866.66 40440.60 86908.80
## Aug 2009       63674.70 48067.49 79281.91 39805.54 87543.87
## Sep 2009       63674.71 47663.01 79686.40 39186.93 88162.48
## Oct 2009       63674.71 47268.49 80080.93 38583.57 88765.85
## Nov 2009       63674.71 46883.24 80466.19 37994.37 89355.06
## Dec 2009       63674.72 46506.63 80842.80 37418.39 89931.04
## Jan 2010       63674.72 46138.10 81211.34 36854.78 90494.66
## Feb 2010       63674.72 45777.16 81572.28 36302.76 91046.68
## Mar 2010       63674.72 45423.35 81926.09 35761.66 91587.78
autoplot(LDES.USAir)+
  autolayer(fitted(LDES.USAir),series="Fitted")+labs(title="US Airline Passengers with Linear Damped ESM Forecast") + geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")

Holt-Winters

The Holt-Winters (HW) model has three components to it (level, trend and seasonality). Seasonality is an interesting component to model since we can have an additive seasonal component or a multiplicative seasonal component. Both models are shown below:

Additive HW

$$\hat{Y}_{t+h} = L_t + hT_t + S_{t-p+h}$$
$$L_t = \theta(Y_t - S_{t-p}) + (1-\theta)(L_{t-1} + T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)T_{t-1}$$
$$S_t = \delta(Y_t - L_{t-1} - T_{t-1}) + (1-\delta)S_{t-p}$$

Multiplicative HW

$$\hat{Y}_{t+h} = (L_t + hT_t)S_{t-p+h}$$
$$L_t = \theta\frac{Y_t}{S_{t-p}} + (1-\theta)(L_{t-1} + T_{t-1})$$
$$T_t = \gamma(L_t - L_{t-1}) + (1-\gamma)T_{t-1}$$
$$S_t = \delta\frac{Y_t}{L_{t-1} + T_{t-1}} + (1-\delta)S_{t-p}$$

Where p is the frequency of the seasonality (i.e. how many “seasons” there are within one year).
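Base R's HoltWinters() also fits both seasonal forms, if you want a version outside the forecast package (sketched here on the built-in AirPassengers series; estimates will differ from hw() because the fitting details differ):

```r
# Base-R analogue: HoltWinters() fits both seasonal forms
hw_add  <- HoltWinters(AirPassengers, seasonal = "additive")
hw_mult <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# Compare the in-sample sum of squared one-step errors of the two fits;
# the airline series has seasonal swings that grow with the level,
# which is the signature of multiplicative seasonality
hw_add$SSE
hw_mult$SSE
```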

# Building a Holt-Winters ESM - US Airlines Data - Additive Seasonality
HWES.USAir <- hw(Passenger, seasonal = "additive")
summary(HWES.USAir)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = Passenger, seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.5967 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 38384.1383 
##     b = 171.3139 
##     s = -1607.408 -2735.133 -266.9438 -4488.589 6259.346 6626.648
##            4166.006 1369.769 250.523 2806.828 -6741.171 -5639.874
## 
##   sigma:  1949.79
## 
##      AIC     AICc      BIC 
## 4515.651 4518.696 4573.265 
## 
## Error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -84.80235 1877.214 1168.093 -0.2917412 2.495749 0.4389788
##                    ACF1
## Training set 0.06636172
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2008       65005.38 62506.62 67504.13 61183.86 68826.90
## May 2008       66294.00 63384.09 69203.92 61843.67 70744.34
## Jun 2008       69259.75 65989.86 72529.64 64258.88 74260.62
## Jul 2008       71890.05 68295.96 75484.15 66393.36 77386.74
## Aug 2008       71692.11 67800.64 75583.59 65740.62 77643.61
## Sep 2008       61113.61 56945.83 65281.39 54739.54 67487.68
## Oct 2008       65504.72 61077.76 69931.67 58734.27 72275.16
## Nov 2008       63206.01 58534.16 67877.87 56061.02 70351.00
## Dec 2008       64502.98 59598.36 69407.60 57002.01 72003.95
## Jan 2009       60640.36 55513.46 65767.26 52799.44 68481.28
## Feb 2009       59708.43 54368.43 65048.44 51541.60 67875.27
## Mar 2009       69425.67 63880.68 74970.66 60945.34 77906.01
## Apr 2009       67038.85 61296.05 72781.64 58255.99 75821.70
## May 2009       68327.47 62393.46 74261.49 59252.18 77402.76
## Jun 2009       71293.22 65173.90 77412.54 61934.53 80651.91
## Jul 2009       73923.52 67624.29 80222.75 64289.67 83557.37
## Aug 2009       73725.58 67251.38 80199.79 63824.14 83627.03
## Sep 2009       63147.08 56502.45 69791.72 52984.99 73309.17
## Oct 2009       67538.18 60727.34 74349.03 57121.89 77954.48
## Nov 2009       65239.48 58266.32 72212.64 54574.96 75904.01
## Dec 2009       66536.45 59404.62 73668.27 55629.26 77443.63
## Jan 2010       62673.83 55386.73 69960.92 51529.18 73818.47
## Feb 2010       61741.90 54302.73 69181.07 50364.67 73119.13
## Mar 2010       71459.14 63870.89 79047.39 59853.92 83064.36
autoplot(HWES.USAir)+
  autolayer(fitted(HWES.USAir),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")

# Building a Holt-Winters ESM - US Airlines Data - Multiplicative Seasonality
HWES.USAir <- hw(Passenger, seasonal = "multiplicative")
summary(HWES.USAir)
## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = Passenger, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4372 
##     beta  = 1e-04 
##     gamma = 0.2075 
## 
##   Initial states:
##     l = 38293.1221 
##     b = 173.9926 
##     s = 0.9658 0.962 1.0064 0.9745 1.1393 1.0801
##            1.0368 0.9994 1.0012 1.0401 0.8799 0.9146
## 
##   sigma:  0.0381
## 
##      AIC     AICc      BIC 
## 4504.228 4507.272 4561.842 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set -113.1889 1848.797 1090.105 -0.383246 2.303162 0.4096702 0.1713934
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2008       65528.49 62329.64 68727.34 60636.28 70420.70
## May 2008       66821.29 63262.13 70380.46 61378.02 72264.57
## Jun 2008       70075.52 66056.65 74094.40 63929.19 76221.86
## Jul 2008       73144.30 68671.62 77616.98 66303.93 79984.68
## Aug 2008       71313.29 66698.66 75927.92 64255.83 78370.76
## Sep 2008       58655.19 54662.55 62647.83 52548.97 64761.40
## Oct 2008       64544.10 59944.79 69143.41 57510.06 71578.13
## Nov 2008       62554.65 57906.98 67202.32 55446.66 69662.65
## Dec 2008       63629.25 58716.73 68541.77 56116.20 71142.30
## Jan 2009       59379.95 54629.79 64130.10 52115.21 66644.68
## Feb 2009       57852.76 53069.42 62636.10 50537.28 65168.25
## Mar 2009       70682.78 64655.56 76710.00 61464.94 79900.62
## Apr 2009       67606.25 61473.18 73739.32 58226.52 76985.97
## May 2009       68934.51 62519.04 75349.97 59122.89 78746.12
## Jun 2009       72285.87 65393.58 79178.16 61745.02 82826.72
## Jul 2009       75445.45 68084.46 82806.44 64187.79 86703.11
## Aug 2009       73551.02 66215.90 80886.14 62332.92 84769.12
## Sep 2009       60490.96 54330.83 66651.08 51069.86 69912.06
## Oct 2009       66558.97 59644.04 73473.89 55983.50 77134.43
## Nov 2009       64502.39 57671.58 71333.21 54055.56 74949.22
## Dec 2009       65605.36 58528.93 72681.80 54782.90 76427.83
## Jan 2010       61219.37 54498.44 67940.31 50940.59 71498.16
## Feb 2010       59640.30 52980.57 66300.04 49455.12 69825.49
## Mar 2010       72861.19 64590.90 81131.47 60212.88 85509.49
autoplot(HWES.USAir)+
  autolayer(fitted(HWES.USAir),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")

Evaluating forecasts

In order to get a better idea of the forecasting properties of these algorithms, it is best to divide your data into a training data set and a test data set. Time series is VERY different from other modeling you have done: the test data set must come at the END of the time series (to truly see how well you can forecast!). Example code is shown below in which the last 12 observations are used as the test data set:

# Create training set from overall Airlines Data
training=subset(Passenger,end=length(Passenger)-12)

# Create test set from overall Airlines Data
test=subset(Passenger,start=length(Passenger)-11)

# Fit Holt-Winters ESM (multiplicative seasonality) on training data
HWES.USAir.train <- hw(training, seasonal = "multiplicative",initial='optimal',h=12)


# Calculate prediction errors from forecast
error=test-HWES.USAir.train$mean

# Calculate prediction error statistics (MAE and MAPE)
MAE=mean(abs(error))
MAPE=mean(abs(error)/abs(test))

MAE
## [1] 1134.58
MAPE
## [1] 0.01763593
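The same split can also be done with base R's window(), sketched here on the built-in AirPassengers series so it runs without the class files (HoltWinters() stands in for hw()):

```r
# Train/test split with window(): hold out the final 12 months,
# fit on the rest, then score the holdout forecasts
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

fit  <- HoltWinters(train, seasonal = "multiplicative")
fcst <- predict(fit, n.ahead = 12)

err <- test - fcst
mean(abs(err))              # MAE on the holdout year
mean(abs(err) / abs(test))  # MAPE (as a proportion)
```

The key point carries over: the holdout always comes from the end of the series, never a random sample.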

ETS

You can also allow the computer to search for the best model. The ETS (Error, Trend, Seasonality) algorithm will search for the best model and estimate the parameters. For the error term, we can have either an additive or multiplicative error structure. For the trend, we can have none, additive, multiplicative, damped additive or damped multiplicative. For the seasonal component, we can have none, additive or multiplicative (lots of choices!). An example of how to run this is:

ets.passenger<-ets(training)
summary(ets.passenger)
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = training) 
## 
##   Smoothing parameters:
##     alpha = 0.6485 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9755 
## 
##   Initial states:
##     l = 38298.6991 
##     b = 99.6316 
##     s = 0.9696 0.9436 0.9968 0.919 1.1276 1.1289
##            1.0784 1.0223 1.0025 1.0531 0.8681 0.89
## 
##   sigma:  0.0369
## 
##      AIC     AICc      BIC 
## 4225.929 4229.567 4285.918 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE    MAPE    MASE        ACF1
## Training set 157.1888 1747.817 1043.596 0.1980283 2.19083 0.38762 0.008869188
ets.forecast.passenger<-ets.passenger%>%forecast::forecast(h=12)
error=mean(abs(test-ets.forecast.passenger$mean))
error
## [1] 1153.042

ARIMA

We will now be switching over to doing ARIMA models!! There are a number of different concepts you will need in order to do this type of modeling.

Stationarity

Before we can try to model the dependency structure, we must first have a stationary distribution! The ADF test is one of the most well-known and accepted tests for stationarity. There are several packages that will do this for you; below, I am focusing on the ADF test within the package aTSA.

Quotes.ts<-ts(Quotes$Quotes,start=2002,frequency=12)
autoplot(Quotes.ts)+labs(title="Time Series of Daily Stock quotes", x="Time", y="Quotes")

The following code produces output similar to the output seen in SAS (under the tau test).

# Perform the ADF test (k=0)
aTSA::adf.test(Quotes.ts)
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag     ADF p.value
## [1,]   0 -0.3061   0.550
## [2,]   1 -0.5980   0.458
## [3,]   2 -0.0632   0.620
## [4,]   3 -0.0950   0.611
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -2.66  0.0939
## [2,]   1 -3.42  0.0192
## [3,]   2 -2.45  0.1608
## [4,]   3 -2.36  0.1943
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -2.62  0.3212
## [2,]   1 -3.36  0.0772
## [3,]   2 -2.41  0.4012
## [4,]   3 -2.29  0.4463
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
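To demystify the test a little: the lag-0 statistic in the "with drift no trend" row is just the t-statistic on the lagged level in a regression of the differenced series on that lag. A base-R sketch on a simulated random walk (which is NOT stationary):

```r
# The lag-0 Dickey-Fuller "tau" (drift, no trend case) by hand:
# regress diff(y) on lag(y) and read off the t-statistic on the lag
set.seed(7)
y    <- cumsum(rnorm(200))       # a random walk: not stationary
dy   <- diff(y)
ylag <- y[-length(y)]

df_fit <- lm(dy ~ ylag)
tau <- summary(df_fit)$coefficients["ylag", "t value"]
tau   # for a random walk this typically sits above the ~ -2.86
      # 5% critical value, so we fail to reject non-stationarity
```

The p-values reported by aTSA come from the Dickey-Fuller distribution of this tau, not the usual t distribution.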

Correlation Functions

The Acf and the Pacf in R will calculate the autocorrelation (up to the lag you specify) and the partial autocorrelation, respectively. You can either output these values to look at them or plot them (see code below).

acf1=Acf(Y, lag=10)$acf

pacf1=Pacf(Y, lag=10)$acf

index1=seq(1,length(pacf1))

all.dat=data.frame(cbind(acf1[2:11],pacf1,index1))
colnames(all.dat)=c("acf","pacf","index")

ggplot(all.dat,aes(x=factor(index),y=acf))+geom_col()+labs(x="Lags")

White noise

For residuals to exhibit white noise, they must be “independent” and normally distributed with mean 0 and constant variance. You already know how to assess normality and constant variance; here we need to focus on assessing “independence”. We can assess whether there is significant dependence through the Ljung-Box test. The hypotheses being tested are

$$H_0: \text{No significant autocorrelation}$$
$$H_A: \text{Significant autocorrelation}$$

This should be assessed on a stationary time series. Looking at a stationary time series, going back 10 lags should be sufficient (this will be different when we get to seasonal models). We would like for all of the p-values (for lags 1-10) to be insignificant. However, keep in mind that sample size will matter when assessing significance.

White.LB <- rep(NA, 10)
for(i in 1:10){
  White.LB[i] <- Box.test(Y, lag=i, type="Ljung-Box", fitdf = 0)$p.value
}

white.dat=data.frame(cbind(White.LB,index1))
colnames(white.dat)=c("pvalues","Lag")

ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Ljung-Box test p-values",x="Lags",y="p-values")+coord_cartesian(ylim = c(0, 0.025))

#### Fit an appropriate model
Y.ARIMA=Arima(Y,order=c(2,0,0))
White.LB <- rep(NA, 10)
for(i in 3:10){
  White.LB[i] <- Box.test(Y.ARIMA$residuals, lag=i, type="Ljung-Box", fitdf = 2)$p.value
}

white.dat=data.frame(cbind(White.LB[3:10],index1[3:10]))
colnames(white.dat)=c("pvalues","Lag") 


ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Ljung-Box test when there is white noise",x="Lags",y="p-values")
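As a sanity check of the mechanics, here is the same Ljung-Box loop applied to simulated white noise, where we expect mostly large p-values:

```r
# Ljung-Box p-values for lags 1-10 on pure white noise (simulated)
set.seed(999)
wn <- rnorm(500)
pvals <- sapply(1:10, function(i) {
  Box.test(wn, lag = i, type = "Ljung-Box", fitdf = 0)$p.value
})
round(pvals, 3)   # generally large: no significant autocorrelation
```

Remember that with many lags tested, an occasional small p-value can still occur by chance.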

AutoRegressive Models (AR)

AutoRegressive (AR) models involve modeling the lags of Y. We can write an autoregressive model as

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t$$
Where there are p lags of Y. Below is the code to fit an AR(2) model.

ggAcf(Y)

ggPacf(Y)

Y.ts <- ts(Y)
Y.ARIMA <- Arima(Y.ts, order=c(2,0,0))

ggAcf(Y.ARIMA$residuals)

ggPacf(Y.ARIMA$residuals)
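One good way to build intuition is to simulate an AR(2) with known coefficients and confirm the fit recovers them. Base R's arima() is used here so the sketch is self-contained (forecast::Arima takes the same order argument):

```r
# Simulate an AR(2) with known coefficients, then refit
set.seed(11)
y_sim <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

fit <- arima(y_sim, order = c(2, 0, 0))
round(coef(fit), 2)   # ar1, ar2 should land near 0.5 and 0.3

# Residuals from a correctly specified fit should look like white noise
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)$p.value
```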

Moving Average model (MA)

Moving average (MA) models involve modeling the lags of the error. We can write a moving average model as

$$Y_t = c + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + \dots + \theta_q\epsilon_{t-q} + \epsilon_t$$
Where there are q lags of ϵ. Below is code to fit an MA(2) model.

ggAcf(x)

ggPacf(x)

x.ts <- ts(x)
x.ARIMA <- Arima(x.ts, order=c(0,0,2))
summary(x.ARIMA)
## Series: x.ts 
## ARIMA(0,0,2) with non-zero mean 
## 
## Coefficients:
##           ma1     ma2    mean
##       -0.2460  0.4772  0.0250
## s.e.   0.0857  0.0923  0.0567
## 
## sigma^2 estimated as 0.2207:  log likelihood=-65.1
## AIC=138.2   AICc=138.63   BIC=148.62
## 
## Training set error measures:
##                        ME      RMSE       MAE      MPE     MAPE      MASE
## Training set 0.0008828966 0.4627151 0.3808289 74.99115 114.4434 0.5453401
##                      ACF1
## Training set -0.002299708
ggAcf(x.ARIMA$residuals)

ggPacf(x.ARIMA$residuals)
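The same simulate-and-refit check works for an MA(2); the thetas below are chosen to roughly mirror the estimates above:

```r
# Simulate an MA(2) with known coefficients, then refit with base arima()
set.seed(22)
x_sim <- arima.sim(model = list(ma = c(-0.25, 0.48)), n = 500)

fit <- arima(x_sim, order = c(0, 0, 2))
round(coef(fit), 2)   # ma1, ma2 should land near -0.25 and 0.48
```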

Fitting ARIMA models

We can use an automated procedure to help us find a model. We will be using the mean of the maximum velocity in the hurricane data set. This data set also has some missing values, which we need to look into first.

max.velocity=hurricane$MeanVMax

ggplot_na_distribution(max.velocity)+labs(y="Mean Max Velocity")

This is yearly data, and the reason those values are missing is that there were no hurricanes recorded in those years. Since there is no trend (nor seasonality), I am going to remove these NA values and then run the Dickey-Fuller test.

max.velocity=na.omit(max.velocity)
hurrican.ts=ts(max.velocity)
aTSA::adf.test(hurrican.ts)
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag     ADF p.value
## [1,]   0 -0.8242   0.384
## [2,]   1 -0.4391   0.517
## [3,]   2 -0.2585   0.569
## [4,]   3 -0.1254   0.607
## [5,]   4 -0.0692   0.623
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -10.69    0.01
## [2,]   1  -7.69    0.01
## [3,]   2  -5.09    0.01
## [4,]   3  -4.09    0.01
## [5,]   4  -3.62    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -11.37    0.01
## [2,]   1  -8.37    0.01
## [3,]   2  -5.70    0.01
## [4,]   3  -4.67    0.01
## [5,]   4  -4.23    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

Series is stationary!! Let’s see if there are any significant dependencies here…

index1=seq(1,10)
White.LB <- rep(NA, 10)
for(i in 1:10){
  White.LB[i] <- Box.test(hurrican.ts, lag=i, type="Ljung-Box", fitdf = 0)$p.value
}

white.dat=data.frame(cbind(White.LB[1:10],index1[1:10]))
colnames(white.dat)=c("pvalues","Lag") 


ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Hurricane data",x="Lags",y="p-values")

There is definitely something to be modeled here!! Let’s try an automated search first…

model1=auto.arima(hurrican.ts)
model2=auto.arima(hurrican.ts,d=0)

Let’s take a look at ACF and PACF plots and see how well we do manually..

ggAcf(hurrican.ts)

ggPacf(hurrican.ts)

Using the graphs and some trial and error, here was the model I chose…

model3=Arima(hurrican.ts,order=c(2,0,3))
summary(model3)
## Series: hurrican.ts 
## ARIMA(2,0,3) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ma1      ma2     ma3     mean
##       0.7921  0.1100  -0.7257  -0.1803  0.1578  91.4046
## s.e.  0.4161  0.3958   0.4094   0.3583  0.0791   1.8812
## 
## sigma^2 estimated as 94.76:  log likelihood=-569.79
## AIC=1153.58   AICc=1154.34   BIC=1174.88
## 
## Training set error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set 0.07215997 9.544078 7.471114 -1.024043 8.28476 0.7050813
##                       ACF1
## Training set -0.0003383851

Comparing the ACF and PACF plots for these models:

ggAcf(model1$residuals,lag.max = 10)

ggPacf(model1$residuals,lag.max = 10)

ggAcf(model2$residuals,lag.max = 10)

ggPacf(model2$residuals,lag.max = 10)

ggAcf(model3$residuals,lag.max = 10)

ggPacf(model3$residuals,lag.max = 10)

Let’s take a look at white noise for each model:

index1=seq(1,10)
White.LB <- rep(NA, 10)
for(i in 2:10){
  White.LB[i] <- Box.test(model1$residuals, lag=i, type="Ljung-Box", fitdf = 1)$p.value
}

white.dat=data.frame(cbind(White.LB[2:10],index1[2:10]))
colnames(white.dat)=c("pvalues","Lag") 


ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 1",x="Lags",y="p-values")

White.LB <- rep(NA, 10)
for(i in 3:10){
  White.LB[i] <- Box.test(model2$residuals, lag=i, type="Ljung-Box", fitdf = 2)$p.value
}

white.dat=data.frame(cbind(White.LB[3:10],index1[3:10]))
colnames(white.dat)=c("pvalues","Lag") 


ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 2",x="Lags",y="p-values")

White.LB <- rep(NA, 10)
for(i in 6:10){
  White.LB[i] <- Box.test(model3$residuals, lag=i, type="Ljung-Box", fitdf = 5)$p.value
}

white.dat=data.frame(cbind(White.LB[6:10],index1[6:10]))
colnames(white.dat)=c("pvalues","Lag") 


ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 3",x="Lags",y="p-values")

ggplot(data = data.frame(resid = as.numeric(model1$residuals)), aes(x = resid)) +
    geom_histogram(bins = 30) +
    labs(title = 'Histogram of Residuals for Model 1', x = 'Residuals', y = 'Frequency')

ggplot(data = data.frame(resid = as.numeric(model2$residuals)), aes(x = resid)) +
    geom_histogram(bins = 30) +
    labs(title = 'Histogram of Residuals for Model 2', x = 'Residuals', y = 'Frequency')

ggplot(data = data.frame(resid = as.numeric(model3$residuals)), aes(x = resid)) +
    geom_histogram(bins = 30) +
    labs(title = 'Histogram of Residuals for Model 3', x = 'Residuals', y = 'Frequency')

You can now forecast the data with these models:

forecast::forecast(model1, h = 10)
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 156       94.59774 82.06430 107.1312 75.42950 113.7660
## 157       94.59774 82.00783 107.1877 75.34314 113.8524
## 158       94.59774 81.95162 107.2439 75.25716 113.9383
## 159       94.59774 81.89565 107.2998 75.17157 114.0239
## 160       94.59774 81.83993 107.3556 75.08635 114.1091
## 161       94.59774 81.78445 107.4110 75.00151 114.1940
## 162       94.59774 81.72921 107.4663 74.91703 114.2785
## 163       94.59774 81.67421 107.5213 74.83291 114.3626
## 164       94.59774 81.61944 107.5760 74.74914 114.4463
## 165       94.59774 81.56490 107.6306 74.66573 114.5298
autoplot(forecast::forecast(model1, h = 10))

autoplot(forecast::forecast(model2, h = 10))

autoplot(forecast::forecast(model3, h = 10))

Seasonality

Seasonality is the component of the time series that represents the effects of the seasonal variation within the dataset. This seasonal effect can be thought of as repetitive behavior that occurs every S time periods, where S is the length of the seasonal period. Remember though that seasonal data is not stationary. This is because the series itself does not revert to an overall, constant mean.
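One simple way to see a seasonal effect is to average a series within each period of the season. This sketch uses the built-in AirPassengers data (so it runs standalone) and base R's cycle function, which returns each observation's period index.

```r
# Average the built-in AirPassengers series within each month of the year
data(AirPassengers)
monthly_means <- tapply(AirPassengers, cycle(AirPassengers), mean)
round(monthly_means)
# The summer months repeat as the high season every 12 observations
```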

Let’s explore the U.S. airline passenger data from 1990 to 2007.

First, we load the dataset into a time series object using the ts function. We use the start = option to define the starting point for our dataset, which is January of 1990. The frequency = option specifies the length of the seasonal period in our dataset. Our data is monthly with an annual seasonal pattern, so our frequency here is 12. From there we split our data into two pieces - training and testing - using the subset function. For the training set, we subset the data to exclude the last 12 months. These last 12 months will be our test data set.

# Load the Data
Passenger <- ts(USAirlines$Passengers, start = 1990, frequency =12)

autoplot(Passenger) + labs(title="Time Series plot for Passengers", x="Date",y="Passengers")

# Create training set from overall Airlines Data
training <- subset(Passenger, end = length(Passenger)-12)

# Create test set from overall Airlines Data
test <- subset(Passenger, start = length(Passenger)-11)

Now, let’s explore our training data by looking at a time series decomposition. This time series decomposition breaks down the data into three pieces - season, trend, and error.

# Time Series Decomposition ...STL#
decomp_stl <- stl(Passenger, s.window = 7)

# Plot the individual components of the time series
plot(decomp_stl)

From the above plot we can see the annual season of our data as well as the upward trending pattern. We do have a shock to the system from the September 11 attacks. This impacted US airline travel for a number of months. We will have to account for this in our modeling as we go along. We can even see this shock in the error term of our decomposition as it resulted in large errors that were out of pattern of the rest of the data.
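Because STL is an additive decomposition, the seasonal, trend, and remainder components sum back to the original series exactly. The check below is a self-contained illustration on the built-in AirPassengers data; any stl object behaves the same way.

```r
data(AirPassengers)
dec <- stl(AirPassengers, s.window = 7)

# time.series holds the seasonal, trend, and remainder columns;
# their row sums reproduce the original observations
recon <- rowSums(dec$time.series)
max(abs(recon - as.numeric(AirPassengers)))  # numerically zero
```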

A great first place to start with any time series dataset is to build an exponential smoothing model. Exponential smoothing models are great first models as they are easy to build and relatively accurate. This allows us to set a baseline to try and beat with more sophisticated modeling approaches.

To build a Holt-Winters exponential smoothing model that accounts for both the trend and seasonality in our data, we use the hw function. With the seasonal = option we can specify whether our data has additive or multiplicative seasonality. The initial = option specifies that the initial values of the HW model are optimized during the actual model building process, as opposed to being estimated with a simple calculation like the overall mean. The h = option specifies that we want to forecast 12 time periods into the future on our training dataset.

# Fit Holt-Winters ESM (multiplicative seasonality) on training data
HWES.USAir.train <- hw(training, seasonal = "multiplicative", initial='optimal', h=12)
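The hw function comes from the forecast package. If you only have base R at hand, stats::HoltWinters fits a comparable multiplicative model; the sketch below uses the built-in AirPassengers data so it runs standalone.

```r
data(AirPassengers)

# Base-R Holt-Winters with multiplicative seasonality
hw.fit <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# 12-month-ahead point forecasts
hw.pred <- predict(hw.fit, n.ahead = 12)
```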

Now that we have our model, let’s plot the forecast as well as evaluate this forecast with our test dataset. The mean element in the ESM model object gives the 12 month forecast that we specified above. We can look at the difference between this forecast and our test dataset - what actually happened in those 12 months. From there we calculate the MAE and MAPE of this forecast. The autoplot function is used to visualize the forecast itself.

# Calculate prediction errors from forecast
HW.error <- test - HWES.USAir.train$mean

# Calculate prediction error statistics (MAE and MAPE)
HW.MAE <- mean(abs(HW.error))
HW.MAPE <- mean(abs(HW.error)/abs(test))*100

HW.MAE
## [1] 1134.58
HW.MAPE
## [1] 1.763593
autoplot(HWES.USAir.train)+
  autolayer(fitted(HWES.USAir.train),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")

From the output above, we see that we have an MAE of 1134.58 and a MAPE of 1.76%. This gives us a good baseline to try and beat with further modeling approaches.
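The error statistics are simple averages; a toy example with made-up numbers makes the arithmetic explicit.

```r
actual <- c(100, 110, 120)   # made-up observed values
pred   <- c( 98, 115, 118)   # made-up forecasts
err <- actual - pred         # 2, -5, 2

MAE  <- mean(abs(err))                     # (2 + 5 + 2) / 3 = 3
MAPE <- mean(abs(err) / abs(actual)) * 100 # about 2.74%
MAE; MAPE
```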

Seasonal Unit Root Testing

Exponential smoothing models do not require data to be stationary. However, ARIMA models do require stationary data for modeling. Since seasonal data is not stationary, we must account for this lack of stationarity in one of two broad categories of approaches - deterministic solutions and stochastic solutions.

Some examples of deterministic solutions to seasonality are seasonal dummy variables, Fourier transforms, and predictor variables. The stochastic solution to seasonality is taking a seasonal difference. Be careful, as the choice between these solutions matters for modeling. This is why we need to evaluate which approach we need using seasonal unit root testing. Similar to trending data, unit root testing allows us to determine if differencing is the proper solution to make our seasonal data stationary. There are many tests for seasonal unit root testing. Each test is a little different in what the null and alternative hypotheses are, so we must be careful to know what we are looking at in the results.

The nsdiffs function runs seasonal unit root tests and reports the number of seasonal differences that are needed for your dataset. If 0, that does not mean your data is stationary. It just means that your seasonality cannot be solved with differencing. If more than 0, then seasonal differencing is the solution to the seasonality problem of our data.

training %>% nsdiffs()
## [1] 1
cbind("Airlines Passengers" = training,
      "Annual change in Passengers" = diff(training, 12)) %>%
  autoplot(facets=TRUE) +
  xlab("Time") + ylab("") +
  ggtitle("Comparison of Differenced Data to Original")

We can see from the output above that we need one seasonal difference to account for the seasonality in our dataset. That means we need to solve our seasonality stochastically. The plot shows both the original data as well as the differenced data. This is not the end of our unit root testing however. There might still be a “regular” unit root in our data. Our data does not appear to be trending, but it still might contain a unit root. Again, we will want to test whether our data requires an additional difference or whether it is stationary.

The ndiffs function applied to the differenced data will help with this. The diff function is used with a lag = 12 to take our seasonal difference.

training %>% diff(lag = 12) %>% ndiffs()
## [1] 0

From above we see that we require no more additional differencing. Since our data no longer appears to have a trend or season, we are ready to go to ARIMA modeling.
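The seasonal difference used here is just each value minus the value S periods earlier. A quick self-contained check on the built-in AirPassengers data confirms what diff(lag = 12) computes:

```r
data(AirPassengers)
d12 <- diff(AirPassengers, lag = 12)

ap <- as.numeric(AirPassengers)
# Each differenced value is this month minus the same month last year
identical_vals <- all(as.numeric(d12) == ap[13:144] - ap[1:132])
length(d12)      # 132 = 144 - 12 observations remain
identical_vals   # TRUE
```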

Deterministic Solutions

If our dataset required a deterministic solution instead of a stochastic one, how would we go about solving this? Again, our dataset does not need a deterministic solution, but let’s explore how we would solve seasonality deterministically for completeness’ sake.

Unlike trend, there are many different approaches to accounting for seasonality deterministically. The approaches we will discuss are seasonal dummy variables, Fourier transforms, and predictor variables.

Seasonal Dummy Variables

Seasonal dummy variables are exactly what their name implies. We will use a set of dummy variables to account for the seasonal effects in our dataset. For a time series with S periods within a season, there will be S-1 dummy variables - one for each period (and one accounted for with the intercept). Since our dataset is monthly, we will build a dummy variable for each month and pick one to not include. For example, we could build the following model:

Y_t = β_0 + β_1 JAN + β_2 FEB + ... + β_11 NOV + e_t
In the above model, we chose to leave out the month of December’s dummy variable since our model has an intercept. In fact, the intercept β_0 is the average effect of December. Each of the coefficients on the respective variables measures the average difference in effect between that month and December (our reference level).

There are a lot of ways to build month variables in a dataset. You could extract month from a time stamp variable for example. Here, we create our own month variable by repeating the values 1 to 12 across our dataset using the rep function. We then convert this to a factor with the factor function as well as set the reference level using the relevel function.

# Seasonal Dummy Variables
Month <- rep(1:12, length.out = length(training))
M <- factor(Month)
M <- relevel(M, ref="12")
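If the series is already a ts object, an alternative is cycle(), which returns each observation's period index directly. A small sketch with simulated monthly data:

```r
# Two years of made-up monthly data starting January 1990
z <- ts(rnorm(24), start = c(1990, 1), frequency = 12)

M2 <- factor(cycle(z))        # values 1-12, one per month
M2 <- relevel(M2, ref = "12") # December as the reference level
table(M2)                     # each month appears twice
```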

By using the lm function we can see how building a linear regression with this factor variable would look. This is exactly what happens under the hood in our ARIMA modeling. If we were to use this factor variable in an ARIMA modeling function like auto.arima, the function first builds the linear regression and then evaluates which ARIMA model is needed on the residuals from that regression. Essentially, this “removes” the impact of the season from our dataset. The auto.arima function’s xreg = option is how we incorporate any external variables (here our dummy variables) into the model. The seasonal = FALSE option tells the function not to try to account for the season itself since we are accounting for it with seasonal dummy variables.

Season.Lin <- lm(training ~ M)
summary(Season.Lin)
## 
## Call:
## lm(formula = training ~ M)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16257.2  -6848.9    484.4   6368.3  14699.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  48761.5     2002.3  24.353   <2e-16 ***
## M1           -4461.9     2792.1  -1.598   0.1116    
## M2           -5433.9     2792.1  -1.946   0.0531 .  
## M3            4099.8     2792.1   1.468   0.1436    
## M4             814.9     2831.7   0.288   0.7738    
## M5            1951.8     2831.7   0.689   0.4915    
## M6            4844.6     2831.7   1.711   0.0887 .  
## M7            7504.9     2831.7   2.650   0.0087 ** 
## M8            7297.4     2831.7   2.577   0.0107 *  
## M9           -3242.5     2831.7  -1.145   0.2536    
## M10           1064.1     2831.7   0.376   0.7075    
## M11          -1268.2     2831.7  -0.448   0.6548    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8256 on 195 degrees of freedom
## Multiple R-squared:  0.2097, Adjusted R-squared:  0.1651 
## F-statistic: 4.703 on 11 and 195 DF,  p-value: 2.227e-06
M.Matrix <- model.matrix(~M)

Trend <- 1:length(training)

SD.ARIMA <- auto.arima(training, xreg = M.Matrix[,2:12], method="ML", seasonal = FALSE)
summary(SD.ARIMA)
## Series: training 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##          ar1      ma1     drift          M1         M2         M3         M4
##       0.4292  -0.7971  120.7148  -3947.9347  -5040.318  4373.0410  1776.4821
## s.e.  0.1142   0.0773   47.0832    485.1706    583.269   625.8909   653.3822
##              M5        M6         M7         M8         M9        M10
##       2774.5010  5539.126  8075.0713  7745.7280  -2913.554  1275.9094
## s.e.   664.7169   667.948   665.0425   655.1154    633.936   589.7366
##              M11
##       -1168.1437
## s.e.    486.9715
## 
## sigma^2 estimated as 3751114:  log likelihood=-1844.41
## AIC=3718.82   AICc=3721.34   BIC=3768.74
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -13.60989 1865.287 1119.089 -0.1822502 2.399032 0.4156601
##                      ACF1
## Training set -0.002860136

There are some advantages and disadvantages to the seasonal dummy variable approach. The advantages are that seasonal dummy variables have nice interpretability - they tell us the average impact each seasonal period has on the target variable - and the approach is rather straightforward to implement. The main disadvantages are that seasonal dummy variables become burdensome when the seasonal period is especially long or complex, and that the constant effect assumed for each season may not be appropriate.

Fourier Transforms

Harmonic regression using Fourier transforms is another approach to account for seasonality deterministically. Fourier showed that a series of sine and cosine terms of the right frequencies approximate periodic patterns in a data series. To do this, we add Fourier variables to a regression model to account for the seasonal pattern. The odd terms k=1,3,5, etc. are accounted for with sine variables:

X_(k,t) = sin(k × 2πt / S)
The even terms k=2,4,6, etc. are accounted for with cosine variables:

X_(k,t) = cos(k × 2πt / S)
In fact, if you add the same number of Fourier variables as you have seasonal dummy variables, you will get the same predictions. However, we typically do not need all of the Fourier variables, especially with large values of S. We need to decide how many of these values we need. One way to approach this is to build many models, while increasing the number of Fourier terms in the models and compare them all via a metric like BIC.
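The forecast package's fourier function builds these variables for us, but they are easy to construct by hand, which makes the periodicity explicit. The fourier_terms helper below is a hypothetical illustration, not a package function.

```r
# Build K sine/cosine pairs for season length S at time points t
fourier_terms <- function(t, S, K) {
  X <- matrix(NA_real_, nrow = length(t), ncol = 2 * K)
  for (k in 1:K) {
    X[, 2*k - 1] <- sin(k * 2 * pi * t / S)   # odd terms: sine
    X[, 2*k]     <- cos(k * 2 * pi * t / S)   # even terms: cosine
  }
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(1:K, each = 2))
  X
}

X <- fourier_terms(1:36, S = 12, K = 2)
head(X)  # columns S1, C1, S2, C2; each column repeats every 12 rows
```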

Here, we loop through the first 6 Fourier terms to account for seasonality. We do this by adding the fourier function to the auto.arima function through the xreg = option. All of this is contained within a loop where we gradually increase the number of Fourier terms in the regression. The K = option in the fourier function does this. We loop through the first 6 Fourier terms (half of the seasonal length) and record the training BIC for each model. This BIC is reported on a plot with the 6 different forecasts (one from each model). The gridExtra package’s grid.arrange function allows the plots to all be put together.

plots <- list()
for (i in seq(6)) {
  fit <- auto.arima(training, xreg = fourier(training, K = i),
                    seasonal = FALSE, lambda = NULL)
  plots[[i]] <- autoplot(forecast::forecast(fit,
                                  xreg = fourier(training, K=i, h=12))) +
    xlab(paste("K=",i,"   BIC=",round(fit$bic,2))) +
    ylab("") + ylim(30000,80000)
}
gridExtra::grid.arrange(
  plots[[1]],plots[[2]],plots[[3]],
  plots[[4]],plots[[5]],plots[[6]], nrow=3)

From the above output we see that the best model had 6 Fourier terms. So we can build this model to account for the seasonality in our dataset.

F.ARIMA <- auto.arima(training, xreg = fourier(training, K = 6), seasonal = FALSE)
summary(F.ARIMA)
## Series: training 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##          ar1      ma1     drift       S1-12       C1-12     S2-12     C2-12
##       0.4289  -0.7970  120.6974  -1232.3084  -4334.8049  313.9197  677.9491
## s.e.  0.1142   0.0773   47.1030    270.2688    270.3805  194.1284  193.6296
##            S3-12      C3-12     S4-12     C4-12      S5-12     C5-12     C6-12
##       -2561.1131  1291.3900  413.8895  208.6503  2314.3804  274.0798  341.8763
## s.e.    152.7962   153.1557  130.4249  130.6610   118.9888  119.5082   81.8796
## 
## sigma^2 estimated as 3751119:  log likelihood=-1844.41
## AIC=3718.82   AICc=3721.34   BIC=3768.74
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -13.55273 1865.288 1119.117 -0.1820955 2.399079 0.4156704
##                      ACF1
## Training set -0.002801581

There are some advantages and disadvantages of the Fourier term approach to accounting for seasonality. The main advantage is that Fourier terms can handle long and complex seasonality. In fact, for multiple seasons in a dataset, we can add more Fourier variables at different frequencies to account for this. The disadvantages of the Fourier approach are that the Fourier terms themselves are not really interpretable and that we need to use trial and error to find the “right” number of variables to use.

After removing the seasonality through deterministic solutions, the remaining error term (residuals) are modeled with seasonal ARIMA models. The key to these new seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.

Stochastic Solutions

The stochastic approach has been hinted at above. When a stochastic solution is best for solving seasonality, we need to take seasonal differences. A seasonal difference is the difference between the current point and the same point in the previous season: Y_t − Y_(t-S). For our dataset, it can be thought of as the year-over-year change in our data.

Here the ggtsdisplay function allows us to see the actual difference along with its correlation pattern with ACF and PACF plots. Notice, we are looking at these ACF and PACF plots with regards to the differences on the training data using the diff function with lag = 12 specifying the length of the difference being the same as our season.

training %>% diff(lag = 12) %>% ggtsdisplay()

After we remove the seasonal effect through stochastic approaches, the remaining differences are modeled with seasonal ARIMA models. The key to these new seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.

There are some limitations to differencing. Differencing is hard to evaluate for long and complex seasons because the statistical tests for stochastic differencing typically end at a season length of 24. Therefore, long and/or complex seasons are typically best approached with deterministic solutions. In fact, it is hard to imagine a difference being very long in terms of time points logically. For example, if you had daily data and thought you had an annual season, it is hard to imagine there is an actual impact of Jan 26 from last year on Jan 26 of this year.

Seasonal ARIMA Modeling

When extending the ARIMA model framework to the seasonal ARIMA framework, we add another set of terms - P, D, Q, and S. Notice how these terms are capitalized.

ARIMA(p,d,q)(P,D,Q)_S

The terms represent the number of seasonal AR terms (P), the number of seasonal MA terms (Q), and the number of seasonal differences (D). The length of the season is still defined as S. Seasonal ARIMA models have the same structure and approach as typical ARIMA models, with AR and MA patterns in the ACF and PACF. The main difference is that the pattern appears on the seasonal lags instead of the individual lags. For example, for an ARIMA(0,0,0)(1,0,0)_12 model, the ACF has an exponentially decreasing pattern every 12 lags while the PACF has only a single spike at lag 12. The pattern is the same, but it is only seen on the season since we are dealing with the seasonal lag. The opposite is true for an ARIMA(0,0,0)(0,0,1)_12 model: the PACF has an exponentially decreasing pattern every 12 lags while the ACF has only a single spike at lag 12.
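We can see this seasonal-lag behavior by simulating a seasonal AR(1) on lag 12 - an AR(12) whose first 11 coefficients are zero - and inspecting its ACF. The coefficient 0.8 is made up for illustration.

```r
set.seed(123)
# Seasonal AR(1): Y_t = 0.8*Y_(t-12) + e_t
y.seas <- arima.sim(model = list(ar = c(rep(0, 11), 0.8)), n = 240)

a <- acf(y.seas, lag.max = 36, plot = FALSE)
a$acf[c(13, 25)]  # spikes at lags 12 and 24, decaying geometrically
```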

For our data we can try the ARIMA(1,0,0)(1,1,1)_12 model using the Arima function. With either the ggtsdisplay function on the residuals, or the checkresiduals function on the model object, we can see that the model still doesn’t capture all of the signal and pattern in the dataset.

# Seasonal ARIMA 
training %>% 
  Arima(order=c(1,0,0), seasonal=c(1,1,1)) %>%
  residuals() %>% ggtsdisplay()

S.ARIMA <- Arima(training, order=c(1,0,0), seasonal=c(1,1,1))
summary(S.ARIMA)
## Series: training 
## ARIMA(1,0,0)(1,1,1)[12] 
## 
## Coefficients:
##          ar1    sar1    sma1
##       0.9056  0.0917  -0.672
## s.e.  0.0364  0.1091   0.093
## 
## sigma^2 estimated as 4126436:  log likelihood=-1763.94
## AIC=3535.88   AICc=3536.09   BIC=3548.97
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 338.6503 1956.379 1156.221 0.5565257 2.418163 0.4294517 -0.2622466
checkresiduals(S.ARIMA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(1,1,1)[12]
## Q* = 45.934, df = 21, p-value = 0.001304
## 
## Model df: 3.   Total lags used: 24

We can still use the auto.arima function to select the “best” starting point model for us as well. We can use the original data in this function along with the seasonal = TRUE option to allow the function to take the seasonal difference for us.

S.ARIMA <- auto.arima(training, method="ML", seasonal = TRUE)
summary(S.ARIMA)
## Series: training 
## ARIMA(1,0,1)(0,1,1)[12] with drift 
## 
## Coefficients:
##          ar1      ma1     sma1     drift
##       0.8800  -0.2962  -0.6785  124.9788
## s.e.  0.0454   0.0950   0.0600   23.6330
## 
## sigma^2 estimated as 3639517:  log likelihood=-1751.67
## AIC=3513.34   AICc=3513.66   BIC=3529.7
## 
## Training set error measures:
##                     ME    RMSE     MAE        MPE     MAPE      MASE       ACF1
## Training set -4.332616 1832.54 1055.07 -0.1745474 2.217472 0.3918815 0.01300462
checkresiduals(S.ARIMA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(0,1,1)[12] with drift
## Q* = 21.957, df = 20, p-value = 0.3428
## 
## Model df: 4.   Total lags used: 24

Above we can see that the auto.arima function selected the ARIMA(1,0,1)(0,1,1)_12 model. By using the checkresiduals function we see that we are left with white noise after our modeling. We still have a large outlier that we will have to account for in the next section on dynamic regression.

We can use the forecast function with h = 12 to forecast the next 12 observations in our dataset. Similar to the exponential smoothing model above, we plot the forecast as well as evaluate the MAE and MAPE from this model.

forecast::forecast(S.ARIMA, h = 12)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2007       63889.23 61444.35 66334.12 60150.11 67628.36
## May 2007       65213.49 62382.40 68044.59 60883.71 69543.27
## Jun 2007       68462.20 65364.91 71559.48 63725.31 73199.09
## Jul 2007       71439.69 68151.01 74728.37 66410.09 76469.29
## Aug 2007       68975.83 65546.25 72405.40 63730.75 74220.90
## Sep 2007       57619.49 54084.66 61154.33 52213.43 63025.56
## Oct 2007       62959.30 59345.04 66573.56 57431.77 68486.83
## Nov 2007       61127.43 57452.84 64802.02 55507.63 66747.23
## Dec 2007       62364.86 58644.21 66085.50 56674.63 68055.09
## Jan 2008       58193.22 54437.30 61949.14 52449.04 63937.41
## Feb 2008       56375.46 52592.44 60158.48 50589.83 62161.08
## Mar 2008       68502.52 64698.65 72306.39 62685.00 74320.04
autoplot(forecast::forecast(S.ARIMA, h = 12)) + autolayer(fitted(S.ARIMA), series="Fitted") + 
  ylab("Airlines Passengers") + 
  geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")

# Calculate prediction errors from forecast
S.ARIMA.error <- test - forecast::forecast(S.ARIMA, h = 12)$mean

# Calculate prediction error statistics (MAE and MAPE)
S.ARIMA.MAE <- mean(abs(S.ARIMA.error))
S.ARIMA.MAPE <- mean(abs(S.ARIMA.error)/abs(test))*100

S.ARIMA.MAE
## [1] 1229.206
S.ARIMA.MAPE
## [1] 1.885399

From the above output we see that our seasonal ARIMA did not beat the Holt-Winters exponential smoothing model in terms of MAE or MAPE. That outlier might be impacting the estimation of our model, so we will have to address that.

The default for the Arima function in R (which is what auto.arima is built off of) uses the multiplicative approach to seasonal ARIMA models. To get an additive seasonal ARIMA model instead we need to use the fixed = option in the Arima function to specify exactly what terms we want to estimate instead as shown below. The zeroes are for terms we do not want to estimate, while the NA values will be estimated by the function.

# Additive Seasonal ARIMA  
S.ARIMA <- Arima(training, order=c(1,0,13), seasonal=c(0,1,0),
                 fixed=c(NA,NA,0,0,0,0,0,0,0,0,0,0,NA,NA), method="ML")
summary(S.ARIMA)
## Series: training 
## ARIMA(1,0,13)(0,1,0)[12] 
## 
## Coefficients:
##          ar1      ma1  ma2  ma3  ma4  ma5  ma6  ma7  ma8  ma9  ma10  ma11
##       0.9679  -0.3698    0    0    0    0    0    0    0    0     0     0
## s.e.  0.0237   0.0880    0    0    0    0    0    0    0    0     0     0
##          ma12    ma13
##       -0.6612  0.2490
## s.e.   0.0626  0.0766
## 
## sigma^2 estimated as 3792723:  log likelihood=-1755.53
## AIC=3521.07   AICc=3521.39   BIC=3537.43
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE    MAPE      MASE       ACF1
## Training set 216.2511 1870.713 1094.634 0.3140324 2.29433 0.4065768 0.02300017

Dynamic Regression Models

Predictor variables are used for a variety of things. Previously we have seen them used to account for trend and seasonality. However, we can also use external variables to help make our forecasts better. Variables such as holiday effects, sales promotions, economic factors, and changes in policy are just a few examples.

Predictor variables are incorporated in a regression and the time series component is applied to the errors from this regression model:

Y_t = β_0 + β_1 X_1 + ... + β_k X_k + Z_t
The Z_t in the above equation is where the time series (typically ARIMA) model is applied. This form of modeling has many names - dynamic regression, ARIMAX, transfer functions, just to name a few.
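A minimal simulated sketch of this structure: a regression on one made-up predictor whose errors follow an AR(1), fit with base R's arima via the xreg = option. All values here are invented for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                                   # made-up predictor
z <- arima.sim(model = list(ar = 0.6), n = n)   # AR(1) error process Z_t
y <- 5 + 2 * x + z                              # Y_t = b0 + b1*X + Z_t

fit <- arima(y, order = c(1, 0, 0), xreg = x)
coef(fit)  # order: ar1, intercept, then the xreg slope (near 0.6, 5, 2)
```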

Intervention Variables

An intervention variable is a variable containing discrete values that flag the occurrence of an event affecting the response series. These variables are used to model and forecast the series itself or to analyze the impact of the specific event. For example, we can measure the impact of a previous sales promotion and forecast a future sales promotion’s impact. We add these discrete variables to models to adjust the intercept of the model during the events.

The three most common types of intervention variables are:

  • Point interventions
  • Shift interventions
  • Ramp interventions

A point intervention is typically denoted with a binary variable that flags when an event occurs by taking a value of 1, with all other values set to zero. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated impact of that intervention.

A shift (or step) intervention is typically denoted with a binary variable that flags when an event occurs as well as the time periods during which the effects of the event last. For example, if you have a change in policy, you would have values of 0 for dates before the policy change and values of 1 for every date after the change. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated impact of that intervention’s shift.

A ramp intervention is typically denoted by 0 values before an event and values that increase by 1 (1,2,3, etc.) starting with the event time point itself. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated slope of this new relationship after the event.
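As a quick sketch (using a hypothetical event at time point 5 in a series of length 10), the three intervention types can be built in base R as follows:

```r
n <- 10      # length of the series (hypothetical)
event <- 5   # time point of the event (hypothetical)

# Point intervention: 1 only at the event itself
point <- rep(0, n)
point[event] <- 1

# Shift intervention: 1 at the event and every period after
shift <- as.numeric(seq_len(n) >= event)

# Ramp intervention: 0 before the event, then 1, 2, 3, ... starting at the event
ramp <- pmax(seq_len(n) - event + 1, 0)
```

Any of these vectors can then be passed to the xreg = option just like the Sep11 variable used with our dataset.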

For our dataset we have a point intervention in September 2001. We can easily create a binary variable that takes a value of 1 during that month and 0 otherwise. By using the xreg = option in the auto.arima function, we can add this intervention to our model as well as build out a seasonal ARIMA. We use the summary and checkresiduals functions to evaluate our model.

Sep11 <- rep(0, 207)
Sep11[141] <- 1

Full.ARIMA <- auto.arima(training, seasonal = TRUE, xreg = Sep11, method = "ML")
summary(Full.ARIMA)
## Series: training 
## Regression with ARIMA(2,0,0)(0,1,1)[12] errors 
## 
## Coefficients:
##          ar1     ar2     sma1     drift        xreg
##       0.7119  0.1369  -0.6626  125.6161  -11539.935
## s.e.  0.0709  0.0713   0.0579   22.6471    1150.961
## 
## sigma^2 estimated as 2442761:  log likelihood=-1712.17
## AIC=3436.35   AICc=3436.8   BIC=3455.99
## 
## Training set error measures:
##                     ME    RMSE      MAE        MPE     MAPE      MASE
## Training set -3.349235 1497.38 1030.214 -0.1299989 2.123044 0.3826493
##                     ACF1
## Training set 0.008315115
checkresiduals(Full.ARIMA)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(2,0,0)(0,1,1)[12] errors
## Q* = 21.697, df = 19, p-value = 0.2996
## 
## Model df: 5.   Total lags used: 24

We can see from the output above that our model now accounts for the outlier that we previously saw in our data. Notice how our model has changed slightly after accounting for this outlier. This will improve the forecasts going forward, even though we do not have any outliers in our validation dataset.

To forecast these values going forward, we need future values of our predictor variable. For this dataset, our Sep11 variable takes a value of 0 for each of the 12 observations we are forecasting for airline passengers. To do this we create another variable (of the same name) that holds these future values. This new variable is then put into the xreg = option inside of the forecast function. Notice how this forecast function comes from the forecast package, with the input being the model we previously built. The forecast function expects a variable in the xreg = option with the same name as in our model.

Sep11 <- rep(0, 12)

forecast::forecast(Full.ARIMA, xreg = Sep11, h = 12)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2007       63955.27 61952.29 65958.25 60891.98 67018.57
## May 2007       65316.30 62857.59 67775.01 61556.03 69076.57
## Jun 2007       68554.32 65778.00 71330.63 64308.31 72800.32
## Jul 2007       71525.80 68534.61 74516.99 66951.16 76100.43
## Aug 2007       68990.75 65846.48 72135.01 64182.01 73799.48
## Sep 2007       58181.07 54925.92 61436.22 53202.75 63159.39
## Oct 2007       63003.05 59666.52 66339.58 57900.27 68105.83
## Nov 2007       61187.86 57791.11 64584.60 55992.98 66382.73
## Dec 2007       62406.11 58964.54 65847.68 57142.69 67669.53
## Jan 2008       58243.83 54768.77 61718.89 52929.18 63558.48
## Feb 2008       56374.88 52874.71 59875.05 51021.83 61727.92
## Mar 2008       68567.34 65048.31 72086.37 63185.45 73949.23
autoplot(forecast::forecast(Full.ARIMA, xreg = Sep11, h = 12)) + autolayer(fitted(Full.ARIMA), series="Fitted") + 
  ylab("Airlines Passengers") + 
  geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")

We can plot our forecasts just as we have done previously.

Predictor Variables

Most forecasting models also need to account for explanatory variables such as price, advertising spending, or income. These kinds of models can be called any of the following: dynamic regression, ARIMAX, transfer functions. Through the examples above, we have seen how to implement this in R.

However, there are often lagged effects of the predictor variables as well as (or instead of) immediate impacts. In other words, previous values of the predictor variables may still play a role in the current prediction of the target variable. The question becomes: how many lags of a variable need to be included in the model? There are multiple ways to evaluate this.

The first, more theoretical, approach involves cross-correlations and pre-whitening of the series. This approach is time consuming, requires building a model for the predictor variables themselves, and is therefore best used for small numbers of predictor variables.

The second approach evaluates many different combinations of lags of the predictor variable across many different models. These models are compared on a metric like AIC/BIC on a validation dataset to see which model fits the data best. This approach is more efficient, handles many variables much more easily, and is similar in accuracy to the first approach.
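One way to generate the candidate lags for this second approach is a small helper that builds a matrix of lagged copies of a predictor; subsets of its columns can then be passed to xreg =. This is a hypothetical helper, not part of the class code:

```r
# Build a matrix whose columns are x lagged 0, 1, ..., k periods
# (leading values of each lagged column are filled with 0)
lag_matrix <- function(x, k) {
  n <- length(x)
  out <- sapply(0:k, function(L) c(rep(0, L), x[seq_len(n - L)]))
  colnames(out) <- paste0("lag", 0:k)
  out
}

X <- lag_matrix(c(5, 3, 8, 2, 7), k = 2)
# X[, "lag1"] is c(0, 5, 3, 8, 2); X[, "lag2"] is c(0, 0, 5, 3, 8)
```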

For our model, the impact of September 11, 2001 was felt for many months (lag impacts). In the code below we add six lags as well as one seasonal lag (the first anniversary of September 11, 2001) to the model through the same xreg = option in auto.arima.

# Intervention Analysis with Lags
Sep11 <- rep(0, 207)
Sep11[141] <- 1

Sep11.L1 <- rep(0, 207)
Sep11.L1[142] <- 1

Sep11.L2 <- rep(0, 207)
Sep11.L2[143] <- 1

Sep11.L3 <- rep(0, 207)
Sep11.L3[144] <- 1

Sep11.L4 <- rep(0, 207)
Sep11.L4[145] <- 1

Sep11.L5 <- rep(0, 207)
Sep11.L5[146] <- 1

Sep11.L6 <- rep(0, 207)
Sep11.L6[147] <- 1

Anniv <- rep(0, 207)
Anniv[153] <- 1

Full.ARIMA <- auto.arima(training, seasonal = TRUE, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), method = "ML")
summary(Full.ARIMA)
## Series: training 
## Regression with ARIMA(2,0,0)(1,1,1)[12] errors 
## 
## Coefficients:
##          ar1     ar2    sar1    sma1     drift       Sep11    Sep11.L1
##       0.6298  0.2207  0.1926  -0.696  124.7562  -17400.420  -12116.115
## s.e.  0.0714  0.0726  0.1143   0.081   21.1622    1162.401    1271.324
##        Sep11.L2   Sep11.L3   Sep11.L4   Sep11.L5   Sep11.L6       Anniv
##       -8076.014  -7670.030  -4344.649  -2173.140  -749.6299  -2306.1784
## s.e.   1387.179   1427.366   1403.914   1271.271  1105.3247    998.2399
## 
## sigma^2 estimated as 1736410:  log likelihood=-1673.71
## AIC=3375.42   AICc=3377.75   BIC=3421.24
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.076103 1235.596 944.9564 -0.0820269 1.937634 0.3509825
##                     ACF1
## Training set 0.007704655
checkresiduals(Full.ARIMA)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(2,0,0)(1,1,1)[12] errors
## Q* = 16.046, df = 11, p-value = 0.1394
## 
## Model df: 13.   Total lags used: 24

From the output above, we see that the model's residuals are white noise, and there do not appear to be any more intervention impacts in the dataset.

Forecasting in time series with only lagged values of the target variable is easy because of the recursive nature of the forecasts - one prediction feeds into the next. Forecasting with external variables is much trickier. For intervention variables it is easier, since we should know future holiday, promotion, etc. values. However, some variables need forecasting themselves. We could either use an external estimate of future values or forecast the future values ourselves.

To forecast this series, we need to have future values of these input variables. Luckily, for our variables, these are just 0 values. We still need to use the forecast function from the forecast package. This function also has an xreg = option where we need future values of these predictor variables. Ideally, these future values have the same name as the original predictor variables for the forecast function to work best. We can plot these updated forecasts as well as calculate the MAE and MAPE.

Sep11 <- rep(0, 12)
Sep11.L1 <- rep(0, 12)
Sep11.L2 <- rep(0, 12)
Sep11.L3 <- rep(0, 12)
Sep11.L4 <- rep(0, 12)
Sep11.L5 <- rep(0, 12)
Sep11.L6 <- rep(0, 12)
Anniv <- rep(0, 12)

forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2007       63937.46 62248.72 65626.20 61354.76 66520.17
## May 2007       65317.01 63321.26 67312.76 62264.77 68369.25
## Jun 2007       68441.10 66189.44 70692.76 64997.48 71884.72
## Jul 2007       71239.57 68817.90 73661.25 67535.94 74943.20
## Aug 2007       68660.79 66113.06 71208.53 64764.37 72557.22
## Sep 2007       58377.67 55736.78 61018.56 54338.78 62416.56
## Oct 2007       63394.60 60683.61 66105.60 59248.49 67540.71
## Nov 2007       61550.88 58786.78 64314.98 57323.55 65778.20
## Dec 2007       62572.13 59767.54 65376.71 58282.89 66861.37
## Jan 2008       58447.60 55612.04 61283.17 54110.98 62784.23
## Feb 2008       56242.81 53383.45 59102.17 51869.80 60615.82
## Mar 2008       68562.05 65684.39 71439.72 64161.04 72963.06
autoplot(forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)) + autolayer(fitted(Full.ARIMA), series="Fitted") + 
  ylab("Airlines Passengers") + 
  geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")

# Calculate prediction errors from forecast
Full.ARIMA.error <- test - forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean

# Calculate prediction error statistics (MAE and MAPE)
Full.ARIMA.MAE <- mean(abs(Full.ARIMA.error))
Full.ARIMA.MAPE <- mean(abs(Full.ARIMA.error)/abs(test))*100

Full.ARIMA.MAE
## [1] 1180.991
Full.ARIMA.MAPE
## [1] 1.800255

From the above output we see that the MAE and MAPE have improved over the ARIMA model that did not account for September 11, but they are still not better than those of the exponential smoothing model.

Prophet

Prophet is a model framework introduced to the public by Facebook in 2018. Facebook uses this algorithm to forecast univariate time series by decomposing them into pieces, similar in spirit to exponential smoothing and ETS. In the Prophet model structure, the signal is broken down into three pieces - growth/trend, season, and holiday.

The growth/trend component uses trend lines (time) as variables in the model. This trend is piecewise: knots break the pattern into segments with different slopes. The user of the algorithm can specify the knots, or the algorithm will try to select them automatically. The trend can also be logistic, which is similar in design to the damped trend approach in exponential smoothing.
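The piecewise trend idea can be sketched in base R with hinge terms, one per knot (an illustration of the concept, not Prophet's internal code; the knot locations here are made up):

```r
t <- 1:100
knots <- c(30, 70)                        # hypothetical knot locations

# Each hinge term is 0 before its knot and grows linearly after it,
# so its coefficient in a regression is a change in slope at the knot
hinges <- sapply(knots, function(k) pmax(t - k, 0))
trend_X <- cbind(t, hinges)               # base slope + slope changes

# Fitting lm(y ~ trend_X) would estimate the base slope plus the
# slope adjustments that kick in after each knot
```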

The seasonal component consists of Fourier variables to account for the seasonal pattern. The algorithm was originally designed for daily data with weekly and yearly seasonal effects. This can be expanded though to handle different types of data and seasons. The yearly season is set to 10 Fourier variables by default:

XY = cos(2πt/365.25) + sin(2πt/365.25) + ⋯ + cos(10 × 2πt/365.25) + sin(10 × 2πt/365.25)
The weekly season is set to three Fourier variables by default:

XW = cos(2πt/7) + sin(2πt/7) + ⋯ + cos(3 × 2πt/7) + sin(3 × 2πt/7)
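These Fourier variables can be generated directly in base R (a sketch; the fourier function in the forecast package does the same job for ts objects):

```r
# Generate Fourier terms up to a given order for a season of length m;
# returns length(t) rows and 2 * order columns (sin/cos pairs)
fourier_terms <- function(t, m, order) {
  out <- sapply(1:order, function(k)
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m)))
  matrix(out, nrow = length(t))
}

# Weekly season (m = 7) with 3 Fourier pairs over two weeks of daily data
XW <- fourier_terms(1:14, m = 7, order = 3)
```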

The last component is the holiday component that consists of binary dummy variables to flag holidays in the dataset.

The prophet package contains all the functions needed for the Prophet algorithm, but it uses a different format than the previous R time series functions. To start, we define our own “holidays”: a data frame of dates that we want flagged with binary variables. For our dataset, these are September 2001 as well as its 6 lags and the one seasonal lag (anniversary). The dates must be stored as date objects (the as.Date function will do this) in a data frame (using the data.frame function) with specific naming conventions: the label variable is called holiday and the date variable is called ds. Next, the training data needs a time variable, also called ds.

Once the dataset is structured and ready, the prophet function can be employed. If we didn’t have our own “holidays” to add to the algorithm, we would just use prophet() to activate the Prophet model framework. However, we use the holidays = holidays option to input our own additional holidays. The add_country_holidays function is used to add pre-programmed holidays by country; we use the “US” option to specify US holidays. Next, we use the add_seasonality function to add our own monthly seasonality. The name = 'monthly' option along with period = 30.5 defines the monthly season with a period of 30.5 days. We can also specify the number of Fourier variables with the fourier.order option. Now that we have the structure created, we fit the algorithm using the fit.prophet function on our model object Prof and dataset prophet.data.

# Prophet
holidays <- data.frame(
  holiday = 'Sep11',
  ds = as.Date(c('2001-09-01', '2001-10-01', '2001-11-01',
                 '2001-12-01', '2002-01-01', '2002-02-01',
                 '2002-09-01')),
  lower_window = 0,
  upper_window = 1
)
prophet.data <- data.frame(ds = seq(as.Date('1990-01-01'), as.Date('2007-03-01'), by = 'm'), y = training)

Prof <- prophet(holidays = holidays)
Prof <- add_country_holidays(Prof, "US")
Prof <- add_seasonality(Prof, name='monthly', period=30.5, fourier.order=6)
Prof <- fit.prophet(Prof, prophet.data)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

Now that the model is created, we can forecast our dataset with the make_future_dataframe function with the periods = 12 and freq = 'month' options to specify that we want 12 months of forecasts. This function structures the dataset (inputs) needed to forecast. The predict function actually does the forecasting. We can plot this forecast with the plot function and calculate our test dataset MAE and MAPE as well.

forecast.data <- make_future_dataframe(Prof, periods = 12, freq = 'month')
predict(Prof, forecast.data)$yhat
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
##   [1] 34243.76 32495.22 38101.67 36946.97 36975.28 40873.20 42424.04 43324.30
##   [9] 35560.23 37836.91 36621.77 36764.24 33790.00 34062.55 38204.16 36368.54
##  [17] 37427.79 40037.22 42637.44 42478.52 34665.00 37211.01 35306.58 36104.98
##  [25] 32147.97 33338.65 42186.88 39335.63 40361.73 43754.04 46299.78 46256.07
##  [33] 36883.19 40290.49 38767.65 39313.00 36637.89 35733.79 43993.84 40868.45
##  [41] 42549.45 45082.83 48279.29 47592.35 37742.47 41910.31 39252.29 40910.26
##  [49] 36793.29 35568.06 45705.08 43253.50 44616.84 47217.88 50103.76 49707.14
##  [57] 40298.61 44257.91 41398.31 43223.39 38703.98 37464.74 46760.78 44882.33
##  [65] 46022.57 48589.17 51270.38 51068.62 42611.82 45839.27 43291.56 44771.40
##  [73] 40270.39 40037.01 50918.45 47996.47 49076.57 52398.14 54997.88 54883.44
##  [81] 44839.38 48901.12 46694.19 47881.48 44621.57 42280.50 52169.55 49182.42
##  [89] 50670.29 53335.46 56338.80 55783.64 45154.18 50040.03 47577.52 48978.19
##  [97] 45156.01 43737.86 53833.48 51117.84 52683.43 55020.42 58108.55 57447.88
## [105] 47925.97 51936.84 48963.78 50840.35 46207.48 45099.70 54158.09 52680.33
## [113] 53357.91 56323.24 58539.88 58736.86 50175.26 53441.67 50789.16 52307.96
## [121] 47702.15 47230.55 57978.58 55483.21 56125.73 59873.92 62036.09 62348.25
## [129] 52227.29 56354.98 54071.15 55324.39 51987.57 49408.29 56523.72 55091.60
## [137] 55019.95 59240.13 60683.94 61683.80 42578.61 45401.09 44022.57 44335.20
## [145] 41597.00 40608.98 55731.99 53909.94 54577.90 57808.47 59998.97 60231.88
## [153] 41707.48 54716.79 53276.13 53616.42 50515.95 50963.50 57578.32 54751.95
## [161] 56774.26 58392.94 61956.21 60806.54 51070.73 55511.34 52659.94 54377.62
## [169] 49572.92 49996.96 61394.87 57552.87 59542.02 61943.58 65452.38 64417.91
## [177] 54098.06 58424.64 55941.92 57394.05 53858.34 52174.70 64058.38 60577.75
## [185] 62554.62 64726.27 68218.61 67169.95 56529.70 61422.04 57973.67 60356.15
## [193] 55548.10 53213.44 64919.11 62762.80 63765.02 66661.34 69186.09 69084.74
## [201] 59776.95 63569.66 60810.80 62469.29 58050.62 56449.65 67239.55 63788.43
## [209] 66435.48 67429.41 71617.44 69843.01 61233.17 64547.81 61847.06 63414.09
## [217] 58760.04 58849.83 71056.10
plot(Prof, predict(Prof, forecast.data))
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

# Calculate prediction errors from forecast
Prophet.error <- test - tail(predict(Prof, forecast.data)$yhat, 12)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
# Calculate prediction error statistics (MAE and MAPE)
Prophet.MAE <- mean(abs(Prophet.error))
Prophet.MAPE <- mean(abs(Prophet.error)/abs(test))*100

Prophet.MAE
## [1] 1449.851
Prophet.MAPE
## [1] 2.252305

Above we see that the Prophet algorithm doesn’t perform as well as the previous algorithms we have tried. This should not be too surprising, as the algorithm is just curve-fitting and does not use any lagged values of the target variable. The forecast simply extends this curve into the future.

Neural Network Models

Neural network models are mathematical models loosely based on how brains function. They are organized as a network of neurons arranged in layers. The input variables are the neurons in the bottom layer. The output variable is the final neuron in the top layer. The layers in between, called hidden layers, transform the input variables through non-linear functions to best model the output variable.

From one layer to the next, the neurons from the previous layer are inputs into each neuron of the next layer, linearly combined with weights (coefficients). Inside each hidden-layer neuron, this linear combination of inputs is transformed with a non-linear function. Typically this is the logistic (sigmoid) function, but a variety of non-linear transformations can be used. The result is then output from each neuron to feed into the next hidden layer. The top layer is just a linear combination of the transformations of the neurons from the last hidden layer.
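A single hidden neuron’s computation can be sketched in a few lines of base R (the weights here are made-up numbers; a real network learns them during fitting):

```r
# Logistic (sigmoid) activation function
sigmoid <- function(z) 1 / (1 + exp(-z))

inputs  <- c(1.0, 0.5, -0.2)   # e.g., three lagged values of the series
weights <- c(0.4, -0.3, 0.8)   # hypothetical learned weights
bias    <- 0.1

# Linear combination of the inputs, then the non-linear transformation
hidden_output <- sigmoid(sum(weights * inputs) + bias)

# The output layer is a linear combination of such hidden outputs
```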

In autoregressive neural networks, the bottom layer (input variables) also contains lags of the target variable as possible inputs. The number of lags to include is typically chosen through trial and error. We can approach this problem similarly to ARIMA modeling, with correlation plots and automatic selection techniques. However, there are no MA terms in these autoregressive neural networks. For seasonal data we typically include all lags up through one season unless correlation plots say you only need specific ones. We do, however, need to make our data stationary first since we are dealing with lagged terms of the target variable.

To build an autoregressive neural network in R, we use the nnetar function. This function cannot take differences of your data so we need to input our differenced training data. The p = and P = options specify the number of normal and seasonal autoregressive lags respectively. From there we just use the forecast function from the forecast package to forecast the next 12 time periods (h = 12 option).

set.seed(12345)
NN.Model <- nnetar(diff(training, 12), p = 2, P = 3)

NN.Forecast <- forecast::forecast(NN.Model, h = 12)
plot(NN.Forecast)

However, the nnetar function only forecasts its input series. Since our input was the differences, the function forecasts the differences, not the original data. The for loop in the code converts these forecasted differences back into forecasts of the actual data. Essentially, to get the first month’s forecast, we add the first forecasted difference to the observation 12 months earlier in our training dataset. Lastly, we can calculate the MAE and MAPE from these forecasts.

Pass.Forecast <- rep(NA, 12)

for(i in 1:12){
  Pass.Forecast[i] <- training[length(training) - 12 + i] + forecast::forecast(NN.Model, h = 12)$mean[i]
}

Pass.Forecast <- ts(Pass.Forecast, start = c(2007, 4), frequency = 12)

plot(training, main = "US Airline Passengers Neural Network Model Forecasts", xlab = "Date", ylab = "Passengers (Thousands)", xlim = c(1990, 2009), ylim = c(30000,80000))
lines(Pass.Forecast, col = "blue")
abline(v = 2007.25, col = "red", lty = "dashed")

# Calculate prediction errors from forecast
NN.error <- test - Pass.Forecast

# Calculate prediction error statistics (MAE and MAPE)
NN.MAE <- mean(abs(NN.error))
NN.MAPE <- mean(abs(NN.error)/abs(test))*100

NN.MAE
## [1] 1087.848
NN.MAPE
## [1] 1.667773
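As a side note, the un-differencing done by the for loop can also be handled by base R’s diffinv function, which inverts diff given the initial season of values. A sketch on toy data (assuming a seasonal difference at lag 12):

```r
# Toy monthly series: a repeating seasonal pattern plus a yearly step
y <- ts(rep(1:12, 3) + rep(0:2, each = 12), frequency = 12)
d <- diff(y, lag = 12)                    # seasonal differences

# diffinv reverses the differencing; xi supplies the first 12 values
y_back <- diffinv(d, lag = 12, xi = y[1:12])
all.equal(as.numeric(y_back), as.numeric(y))   # TRUE
```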

Finally, we have a model that beats the original exponential smoothing model’s forecasts! Before any problem begins, we do not know which algorithm will win so we try them all.

Weighted and Combined Models

The thought process around weighted and combined forecasting (also called composite forecasting) is not new. The topic has been studied for years, and empirical evidence indicates that the combination of forecast methods tends to outperform most single forecast methods.

It is better to average forecasts in the hope that the biases among the methods and/or forecasters will compensate for one another. As a result, forecasts developed using different methods are expected to be more useful in cases of high uncertainty. The method is especially relevant for long-range forecasting, where uncertainty is extremely high. Many studies have demonstrated the value of combining forecasts.

There are two basic weighting methods:

  1. Simple Averaging of Forecasts
  2. Weighted Methods Based on Error

The simple averaging approach is just as easy as it sounds. It takes the average of the forecasted values from each of the models being combined.
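With hypothetical numbers, simple averaging is just element-wise arithmetic on the forecast vectors:

```r
# Three hypothetical 3-step-ahead forecasts from different models
f1 <- c(100, 105, 110)
f2 <- c(102, 104, 112)
f3 <- c( 98, 106, 108)

# Equal weight (1/3) on each model
f_avg <- (f1 + f2 + f3) / 3
# f_avg is c(100, 105, 110)
```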

The code below just takes the average of the four models we have been working with for this dataset - the Holt-Winters exponential smoothing model, the dynamic regression ARIMA model, the neural network model, and Prophet algorithm model. We then calculate the MAE and MAPE from this model.

For.Avg <- (HWES.USAir.train$mean + 
            forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean + 
            Pass.Forecast +
            tail(predict(Prof, forecast.data)$yhat, 12))/4
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
Avg.error <- test - For.Avg

Avg.MAE <- mean(abs(Avg.error))
Avg.MAPE <- mean(abs(Avg.error)/abs(test))*100

Avg.MAE
## [1] 1046.492
Avg.MAPE
## [1] 1.586614

This average of the four models performs better than any of the individual models. However, maybe not all of the models are needed in this average. In fact, trying out many different combinations of models shows that only the exponential smoothing, neural network, and Prophet model are needed.

For.Avg <- (HWES.USAir.train$mean + 
              Pass.Forecast +
              tail(predict(Prof, forecast.data)$yhat, 12)
            )/3
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
Avg.error <- test - For.Avg

Avg.MAE <- mean(abs(Avg.error))
Avg.MAPE <- mean(abs(Avg.error)/abs(test))*100

Avg.MAE
## [1] 1001.659
Avg.MAPE
## [1] 1.5154

The MAE and MAPE are lower than any of the individual models as well as the previous average across all models.

The second method uses weighted averaging instead of simple averaging. In weighted averaging, the analyst selects the weights to assign to the individual forecasts. Typically, we assign weights that minimize the variance of the forecast errors over the forecasted period. With only two forecasts, there is an easily derived mathematical solution for the variance-minimizing weights. With three or more forecasts, the equations become much harder, and we can use linear regression to help. We run a regression of the actual values of Y against the forecasted values of Y and let the regression choose the optimal weight for each.

To make sure that the weights sum to one, we can use some algebra. By fixing the weight (coefficient) of one of the models at one using the offset function, we can use the I function to estimate the weights (coefficients) on the differences between the remaining models and the offset model. Working out the math algebraically, each model inside an I function gets its estimated weight, while the model in the offset function gets a weight of 1 minus the sum of the remaining weights.
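The offset/I trick can be demonstrated on synthetic data (everything here is simulated, so only the mechanics carry over): the I() coefficients are the weights on two of the models, and the offset model’s weight is one minus their sum.

```r
set.seed(123)
y  <- cumsum(rnorm(100, mean = 1))   # synthetic target series
m1 <- y + rnorm(100, sd = 2)         # three hypothetical in-sample fits
m2 <- y + rnorm(100, sd = 3)
m3 <- y + rnorm(100, sd = 4)

# offset() fixes m1's coefficient at 1; the I() terms estimate the
# weights on m2 and m3 relative to m1
wc <- lm(y ~ offset(m1) + I(m2 - m1) + I(m3 - m1) - 1)

w2 <- unname(coef(wc)[1])
w3 <- unname(coef(wc)[2])
w1 <- 1 - w2 - w3                    # the weights sum to one by construction
```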

Pass.Fit.NN <- rep(NA, 207)

for(i in 25:207){
  Pass.Fit.NN[i] <- training[i - 24] + NN.Model$fitted[i - 12]
}

Pass.Fit.ARIMA <- Full.ARIMA$fitted
Pass.Fit.HWES <- HWES.USAir.train$fitted
Pass.Fit.Prophet <- head(predict(Prof, forecast.data)$yhat, 207)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
WC.Model <- lm(training ~ offset(Pass.Fit.HWES) + I(Pass.Fit.ARIMA - Pass.Fit.HWES) + I(Pass.Fit.NN - Pass.Fit.HWES) + I(Pass.Fit.Prophet - Pass.Fit.HWES) - 1)
summary(WC.Model)
## 
## Call:
## lm(formula = training ~ offset(Pass.Fit.HWES) + I(Pass.Fit.ARIMA - 
##     Pass.Fit.HWES) + I(Pass.Fit.NN - Pass.Fit.HWES) + I(Pass.Fit.Prophet - 
##     Pass.Fit.HWES) - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3153.9  -741.1   -47.0   731.0  3826.9 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## I(Pass.Fit.ARIMA - Pass.Fit.HWES)    0.74970    0.06973  10.751  < 2e-16 ***
## I(Pass.Fit.NN - Pass.Fit.HWES)      -0.02450    0.02483  -0.986    0.325    
## I(Pass.Fit.Prophet - Pass.Fit.HWES)  0.28999    0.07251   3.999 9.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1178 on 156 degrees of freedom
##   (48 observations deleted due to missingness)
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 1.097e+05 on 3 and 156 DF,  p-value: < 2.2e-16
ARIMA.coef <- coef(WC.Model)[1]
NN.coef <- coef(WC.Model)[2]
Prophet.coef <- coef(WC.Model)[3]
HW.coef <- 1 - ARIMA.coef - NN.coef - Prophet.coef

Now we can just combine the forecasts with their respective weights to get our final forecast. From there we can evaluate the MAE and MAPE for this minimum variance weighted average forecast.

For.W.Avg <- HW.coef*HWES.USAir.train$mean + 
             ARIMA.coef*forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean + 
             NN.coef*Pass.Forecast +
             Prophet.coef*tail(predict(Prof, forecast.data)$yhat, 12)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044

## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
W.Avg.error <- test - For.W.Avg

W.Avg.MAE <- mean(abs(W.Avg.error))
W.Avg.MAPE <- mean(abs(W.Avg.error)/abs(test))*100

W.Avg.MAE
## [1] 1068.435
W.Avg.MAPE
## [1] 1.605295

Unfortunately, just because we have the minimum variance on the training data doesn’t mean that we will do well on the test dataset. In fact, in practice, the simple average of forecasts is very hard to beat.